A COMPARISON OF NORMAL AND
ELLIPTICAL ESTIMATION METHODS IN 
STRUCTURAL EQUATION MODELS 



ABSTRACT 



Monte Carlo simulation compared chi-square statistics, parameter estimates, and root mean 
square error of approximation values using normal and elliptical estimation methods. Three research 
conditions were imposed on the simulated data: sample size, population contamination percent, and 
kurtosis. A Bentler-Weeks structural model established the relationship between the sample
variance-covariance matrix and the specified population model. The elliptical generalized least squares
estimation method provided the better chi-square results in the presence of kurtosis. The parameter
estimates were similar across research conditions for both the normal and elliptical estimation methods. 
The root mean square error of approximation values were robust in the presence of kurtosis for the 
elliptical estimation methods. The root mean square error of approximation is therefore the preferred 
inferential approach to assessing model fit in the presence of kurtosis because of known distributional 
properties and determination of confidence intervals for hypothesis testing. 






Some type of estimation method is used in all parametric statistics, e.g., regression analysis,
factor analysis, discriminant analysis, and canonical correlation analysis (Ferguson & Takane, 1989).
The various estimation methods are used to derive sample estimates of population parameters
(Marcoulides & Hershberger, 1997). The estimation methods, however, produce different results
depending upon the assumptions made by the researcher. In structural equation modeling, various normal
and elliptical estimation methods can be used to estimate population parameters from sample data.
Least squares (LS), generalized least squares (GLS), and maximum likelihood (ML) estimation
procedures assume a normal distribution (Bollen, 1989). Elliptical LS (ELS), elliptical GLS (EGLS),
and elliptical re-weighted least squares (ERLS) procedures assume an elliptical distribution (Bentler,
1992).



RELATED RESEARCH LITERATURE 



In practice, one typically does not know the population variance-covariance matrix and the
population parameter(s). Hence, an estimation method is used to obtain sample estimates of the
unknown population parameter(s) based on the sample variance-covariance matrix, S. Once sample
parameter estimates are derived, one can compute the model implied sample variance-covariance
matrix, $\Sigma$. Sample parameter estimates are derived such that $\Sigma$ is as close to S as possible. The
difference between S and $\Sigma$ is typically indicated by a chi-square statistic, although the root mean
square error of approximation is also recommended (Schumacker & Lomax, 1996). Obviously, if
$S - \Sigma = 0$, then the sample parameter estimates derived from the estimation method perfectly reflect the
population parameters based on the fit function, $F(S, \Sigma(\theta))$, and chi-square equals zero.

NORMAL DISTRIBUTION THEORY



The normal distribution with certain statistical assumptions has played a fundamental role in 
multivariate statistical analysis (Muirhead, 1982). A sufficient condition for the underlying normal 
distribution assumption to hold is that the observed variables do not have excessive kurtosis. Basically, 
the kurtosis of each observed variable should equal zero, which is the kurtosis of a normal distribution 
(Bollen, 1989; Browne, 1974). In structural equation modeling, several normal estimation methods are
available depending upon the fit function. 

The least squares (LS) estimation method, which assumes multivariate normally distributed
variables, minimizes the following fit function: $F_{LS} = .5\,\mathrm{tr}[(S - \Sigma)^2]$, where the degrees of
freedom are $df = .5(p + q)(p + q + 1) - t$, t = the number of independent parameters to be estimated,
n = the number of observations or sample size, (p + q) = the number of observed variables analyzed, and
tr = the trace or diagonal sum of the matrix elements (Schumacker & Lomax, 1996). Multiplying the fit
function by (n - 1), i.e., $(n - 1)F_{LS}$, yields a chi-square statistic. The generalized least squares (GLS)
estimation method uses the following fit function: $F_{GLS} = .5\,\mathrm{tr}[(S - \Sigma)S^{-1}]^2$, where $S^{-1}$ is a
positive definite weight matrix applied to the residuals, i.e., the differences in the matrix elements,
$S - \Sigma$. The default estimation method in most computer programs is the maximum likelihood (ML)
estimation method, which can be derived by assuming that the observed variables are multivariate
normally distributed. The ML parameter estimates are obtained by minimizing the following discrepancy
function: $F_{ML} = \mathrm{tr}(S\Sigma^{-1}) - (p + q) + \ln|\Sigma| - \ln|S|$. If the sample covariance matrix, S, is close
to the predicted population matrix, $\Sigma$, then the sample data fit the model and $F_{ML}$ approaches zero
(i.e., if $\Sigma = S$, then $\ln|\Sigma| - \ln|S| = 0$). Likewise, if $\Sigma = S$, then the trace or sum of the diagonals
of $S\Sigma^{-1}$ equals (p + q), the number of observed variables analyzed, and the value of
$\mathrm{tr}(S\Sigma^{-1}) - (p + q)$ equals zero. In large samples and under specific conditions (Browne, 1974, 1984;
Jöreskog, 1967), $(n - 1)F_{ML} \sim \chi^2$, where $(p^* - q)$ are the degrees of freedom, $p^* = p(p + 1)/2$, and
q is the number of parameters to be estimated. Therefore, the ML fit function yields a chi-square
statistic.
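
The three normal theory fit functions above can be illustrated with a small numerical sketch. The code below is not taken from EQS; it is a minimal standard-library Python illustration in which the helper names (`matmul`, `trace`, `inverse_and_det`, `fit_functions`, `chi_square`) are hypothetical, and S and Sigma are assumed to be symmetric positive definite matrices supplied as lists of rows.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))


def inverse_and_det(a):
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, determinant)."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        p = m[col][col]
        det *= p
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m], det


def fit_functions(S, Sigma):
    """Return F_LS, F_GLS, and F_ML for sample matrix S and implied matrix Sigma."""
    k = len(S)  # number of observed variables, (p + q) in the text
    resid = [[S[i][j] - Sigma[i][j] for j in range(k)] for i in range(k)]
    S_inv, S_det = inverse_and_det(S)
    Sigma_inv, Sigma_det = inverse_and_det(Sigma)
    f_ls = 0.5 * trace(matmul(resid, resid))
    weighted = matmul(resid, S_inv)
    f_gls = 0.5 * trace(matmul(weighted, weighted))
    f_ml = (trace(matmul(S, Sigma_inv)) - k
            + math.log(Sigma_det) - math.log(S_det))
    return {"LS": f_ls, "GLS": f_gls, "ML": f_ml}


def chi_square(f, n):
    """Chi-square statistic (n - 1) * F for sample size n."""
    return (n - 1) * f
```

When S and Sigma are identical, all three functions return zero, which matches the statement that a perfectly fitting model yields a chi-square of zero.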

The multivariate normal distribution of the z variables has a mean vector, $\mu$, and a covariance
matrix, $\Sigma$, described by the density function:

$$Y = (2\pi)^{-p/2}\,|\Sigma|^{-1/2}\,e^{-\frac{1}{2}(z - \mu)'\Sigma^{-1}(z - \mu)}$$

where Y = height of the normal curve for the z variables, p = the number of variables, $\pi$ = a constant,
3.1416, and e = base of Napierian logarithms = 2.7183 (Ferguson & Takane, 1989). Standard score
variables have a mean = 0 and a standard deviation = 1, so $\mu = 0$ and $\sigma = 1$. The area under the
normal distribution is unity (see Figure 1).



Insert Figure 1 Here
(Normal Distribution)

A general formula to derive sample parameter estimates in a structural equation model
given the normal distribution assumption is (Bentler, 1992):

$$Q_N = 2^{-1}\,\mathrm{tr}[(S - \Sigma)W_2]^2$$

The weight matrix, denoted as $W_2$ in this general formula, is replaced by any of the three normal theory
estimators of $\Sigma^{-1}$:

(a) $W_2 = I$ (identity matrix) gives normal least squares (LS);

(b) $W_2 = S^{-1}$ gives normal generalized least squares (GLS);

(c) $W_2 = \hat{\Sigma}^{-1}$ gives normal re-weighted least squares (ML).

ELLIPTICAL DISTRIBUTION THEORY 

Elliptical distributions are a broad class of distributions that includes both heavy- and light-tailed
symmetric distributions relative to the normal distribution. The characteristic function of an elliptical
distribution, for some function $\psi$ (Muirhead, 1982), is of the form:

$$\phi(t) = e^{it'\mu}\,\psi(t'Vt)$$

where V is a positive definite scale matrix proportional to the covariance matrix, $\Sigma$. For m > 2,
Berkane and Bentler (1986) defined

$$\kappa(m) + 1 = \frac{\psi^{(m)}(0)}{(\psi'(0))^m}$$

where $\psi^{(m)}(0)$ and $\psi'(0)$, respectively, are the m-th and the first derivative of $\psi$, evaluated at
zero. Assume $\mu = 0$ without loss of generality, and let
$\mu_{i_1 i_2 \ldots i_{2m}} = E(X_{i_1} X_{i_2} \cdots X_{i_{2m}})$. Berkane and Bentler (1986) showed that, if
$i_1 = i_2 = \ldots = i_{2m} = i$, then:

$$\mu_{2m} = \frac{(2m)!}{2^m\,m!}\,(\kappa(m) + 1)\,(\mu_2)^m$$

This relationship characterizes the elliptical distribution, i.e., if a random variable y has density f(y),
if all odd moments are zero, and if the (2m)th moment exists and is defined by
$\mu_{2m} = C(m)(\mu_2)^m$, for some constant C depending on m, then y is elliptically distributed.

The multivariate elliptical distribution of the y variables has a mean vector, $\mu$, and a covariance
matrix, $\Sigma$, described by the following density function (Bentler, 1992):

$$f(y) = k_1\,|\Sigma|^{-1/2}\,g[k_2 (y - \mu)'\Sigma^{-1}(y - \mu)]$$

where $k_1$ and $k_2$ are constants and g is a non-negative function. This density function yields an
elliptical distribution. The y variables have a common kurtosis parameter of:

$$\kappa = \frac{\sigma_{iiii}}{3\sigma_{ii}^2} - 1$$

which describes the tails of the distribution relative to the multivariate normal distribution. The
multivariate normal distribution is therefore a special case of the multivariate elliptical distribution
when $\kappa = 0$. Values for the parameter $\kappa$ other than 0 (zero) characterize elliptical distributions
(Berkane & Bentler, 1987a, 1987b).



Insert Figure 2 Here 
(Elliptical Distribution) 



A general formula to derive sample parameter estimates given an elliptical distribution assumption
(Bentler, 1992) is:

$$Q_E = 2^{-1}(\kappa + 1)^{-1}\,\mathrm{tr}[(S - \Sigma)W_2]^2 - \delta\,[\mathrm{tr}(S - \Sigma)W_2]^2$$

The weight matrix, denoted as $W_2$ in this general formula, is replaced by any of three elliptical
estimators of $\Sigma^{-1}$:

(a) $W_2 = I$ (identity matrix) gives elliptical least squares (ELS) estimates;

(b) $W_2 = S^{-1}$ (fixed) gives elliptical generalized least squares (EGLS) estimates;

(c) $W_2 = \hat{\Sigma}^{-1}$ (iteratively updated) gives elliptical re-weighted least squares (ERLS) estimates.



The Mardia-based $\kappa$ coefficient (Mardia, 1970, 1974) can be used in elliptical
computations (Bentler, 1992). The default computation of $\kappa$ (Shapiro & Browne, 1987) is given by:

$$\hat{\kappa} = \frac{g_{2,p}}{p(p + 2)},$$

where

$$g_{2,p} = N^{-1}\sum_{i=1}^{N}\left[(z_i - \bar{z})'S^{-1}(z_i - \bar{z})\right]^2 - p(p + 2)$$

is the deviation from the expected multivariate Mardia-based kurtosis value. Here $z_i$ and $\bar{z}$ denote
the raw score and mean vectors, respectively. The normalized (standard score) estimate is given by:

$$\frac{g_{2,p}}{(8p(p + 2)/N)^{1/2}},$$

which, in large samples, operates the same as the unit normal variate in the normal distribution. The
normalized estimate can be used to test the null hypothesis of multivariate normality.

The relative merits of alternative estimators of $\kappa$ have not yet been established (Bentler, 1992).
In non-elliptical populations, these estimators do not necessarily converge. The Mardia-based $\kappa$
coefficient, however, does have asymptotic expectation and variance, such that:

$$E(\hat{\kappa}) = \kappa.$$

The use of normal or elliptical distributions in structural equation modeling is based on theoretical
considerations. It is possible that failures of normal or elliptical estimation methods can be associated
with the estimation of $\kappa$ (Tyler, 1982, 1983). In most estimation methods, however, an assumption
underlying the fit function is that the variables have some particular multivariate distribution, either
normal or elliptical. Consequently, the chi-square ($\chi^2$) test is used as a goodness-of-fit test (fit
function) between S and $\Sigma$, given optimal sample weight estimates.
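
The Mardia-based kurtosis estimate and its normalized form described above can be sketched numerically. The following standard-library Python code is an illustration only, not the EQS implementation; the function names `invert` and `mardia_kappa` are hypothetical, and the sample covariance matrix is assumed to use the divisor N, which may differ from the convention used by a given program.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mardia_kappa(data):
    """Return (g2p, kappa_hat, normalized estimate) for a list of raw score rows."""
    N, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / N for j in range(p)]
    dev = [[row[j] - mean[j] for j in range(p)] for row in data]
    # Assumption: S is computed with the divisor N.
    S = [[sum(d[i] * d[j] for d in dev) / N for j in range(p)] for i in range(p)]
    S_inv = invert(S)
    total = 0.0
    for d in dev:
        # Squared Mahalanobis distance of one case from the mean vector.
        quad = sum(d[i] * S_inv[i][j] * d[j] for i in range(p) for j in range(p))
        total += quad ** 2
    g2p = total / N - p * (p + 2)
    kappa_hat = g2p / (p * (p + 2))
    normalized = g2p / (8 * p * (p + 2) / N) ** 0.5
    return g2p, kappa_hat, normalized
```

For a single variable that takes only the values +1 and -1 equally often, the estimate equals -2/3, the most platykurtic value possible for p = 1.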

CHI-SQUARE, PARAMETER ESTIMATES AND KURTOSIS 



Chi-Square

A number of studies have investigated the chi-square statistic in normal and non-normal data
samples. In non-normal samples containing kurtosis, the chi-square statistic based on the ML
estimation method was too large, causing the rejection of a true structural equation model too often
(Bentler, 1992; Harlow and Newcomb, 1984; La Du and Tanaka, 1989; Muthen and Kaplan, 1985;
Tanaka, 1984). In studies using ML estimation with normal samples, the chi-square statistic had little
bias with samples ranging from n > 30 (Geweke & Singleton, 1980), to n = 200 (Boomsma, 1983), to
n = 500 (Browne, 1982, 1984), to n = 1000 (Muthen & Kaplan, 1985). Wang, Fan, and Willson
(1996) explained that the adjusted chi-square test (Satorra-Bentler re-scaled chi-square) reported in
the presence of elliptically distributed data can provide acceptable conclusions given an appropriate
sample size that balances the statistical power of the test with sampling variation. Hoogland and
Boomsma (1998) suggested that the ML chi-square statistic often rejected the true model when the
sample size was smaller than five times the number of degrees of freedom of the model. When the
observed variables had an average positive kurtosis as large as 5.0, the sample size may have to be
increased by up to 10 times the size of the model. Given that the model is appropriate, the GLS
chi-square statistic may have an acceptable performance for a sample size that is half the
sample size needed for an acceptable performance of the ML chi-square statistic.

Weng and Cheng (1997) suggested that although chi-square values given by ML, LS, and
GLS estimators differ, the effects of this discrepancy on relative fit indices may diminish as sample size
increases. For example, if a model fits the data and the sample size is very large, ML and GLS
estimation methods yield a very similar chi-square statistic (Browne, 1974).

Parameter Estimates 

The effects of various estimation methods on the parameter estimates in structural equation
models have also been studied. Harlow (1985) concluded that ML and ERLS parameter estimates were
comparable in a Monte Carlo factor analysis simulation study. Muthen and Kaplan (1985) found no
difference between parameter estimates using the ML and GLS estimation methods. Henly (1993)
pointed out a striking similarity between ML and GLS estimates. Wang, Fan, and Willson (1996) also
found the results from the ML and GLS methods to be practically identical, except for some
insignificant differences.

Boomsma (1983), in a Monte Carlo study using ML estimation with normal continuous data,
found that “Generally for N > 200 there is little bias in estimating parameters...” (p. 116). Boomsma also
examined categorical, skewed, and kurtotic data, and he concluded that parameter estimates were
unbiased for N = 400 using the ML estimation method. Boomsma’s findings were supported in a
Monte Carlo study by Muthen and Kaplan (1985), which studied estimates based on ordered
categorical data using ML, GLS, and ADF estimation methods. Muthen and Kaplan found that ML,
GLS, and ADF methods were unbiased when using a sample size of 1000. Browne (1982, 1984)
conducted a Monte Carlo study of ML and ADF estimation in both normal and non-normal continuous
data with N = 400 and N = 500. Browne further suggested that parameter estimates were unbiased
when using ML estimation in normal samples.

Hoogland and Boomsma (1998) found that the bias of ML parameter estimates increased when
the level of univariate skewness and kurtosis deviated increasingly from normal theory values.
Hoogland and Boomsma also suggested that a larger sample size (N > 500) was a remedy for obtaining
unbiased parameter estimates. Wang, Fan, and Willson (1996) concluded that population parameter
mean estimates across 100 replications approached the population values as the sample size increased
from 200 to 1000. The differences between the minimum and maximum parameter estimates
decreased remarkably with an increased sample size. The quality of parameter estimates was not of
much concern even with non-normal data, provided that appropriately large samples were used.
Wang, Fan, and Willson also found that the parameter estimates appeared to stabilize when the sample
size reached 500. Weng and Cheng (1997) compared the three normal theory estimators and found
that ML and LS estimation methods yielded identical parameter estimates, which were slightly different
from GLS estimates.

Kurtosis 

A number of studies have examined the impact of kurtosis in non-normal data. Browne (1982,
1984) developed an asymptotic distribution free index which permitted the use of a generalized least
squares estimator even when the variables exhibited excessive kurtosis (“peakedness”) or insignificant
kurtosis (“flatness”) relative to the multivariate normal distribution. Social scientists frequently are
concerned about the skewness in their data; however, Browne indicated that it is kurtosis, not skewness,
that is critical because kurtosis is a term in the mathematical expression for the covariances. That is,
when data are not normally distributed, the researcher must know about the variables' kurtosis, as well
as the variable means and covariances, in order to make inferences about individual patterns of scores.

Harlow (1985) studied elliptically distributed data in factor analysis and found that the ERLS
(elliptical re-weighted least squares) estimation method performed the best under various levels of
kurtosis ($\kappa > 0$). Hoogland and Boomsma (1998) further concluded that the bias in parameter
estimates increased when the absolute value of kurtosis increased. They discovered a remarkable
effect of the sign of the kurtosis, namely, the bias of ML estimates is positive for platykurtic
distributions and negative for leptokurtic distributions. Bias becomes most extreme when the underlying
distribution is highly leptokurtic.

The elliptical distribution differs from the normal distribution based on kurtosis in the sample
data. One would therefore expect the chi-square statistics, parameter estimates, and root mean square
error of approximation values to differ when comparing results from these two distributions. It is
anticipated that normal estimation methods in structural equation modeling would yield biased results
when using non-normal sample data. Moreover, elliptical estimation methods should outperform
normal estimation methods given elliptical data distributions.

METHODS AND PROCEDURES

The EQS 5.7 software program (see appendix) permitted the specification of different population
data contamination percentages (.05 and .10), sample sizes (1000, 5000, 10000), and kurtoses (1, 2, 3),
which followed suggestions by Mattson (1997) and Mooney (1997). This yielded a 2 X 3 X 3 design with
18 unique research conditions. The EQS 5.7 program generated a sampling distribution based on 100
replications of these conditions. The fit function ($\chi^2$), structural coefficient ($\gamma$), and root mean
square error of approximation values were saved in separate files and compared in tables across these
research conditions.
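
The 2 X 3 X 3 design can be enumerated with a short sketch. The Python code below is only an illustration of how the 18 research conditions arise from crossing the three factors; the variable names are hypothetical and the code is not part of the EQS program.

```python
from itertools import product

contamination_levels = [0.05, 0.10]   # proportion of contaminated cases
sample_sizes = [1000, 5000, 10000]    # cases per replication
k_values = [1, 2, 3]                  # kurtosis (scale) factor

# Cross the three factors to obtain every research condition.
conditions = [
    {"contamination": c, "n": n, "k": k}
    for c, n, k in product(contamination_levels, sample_sizes, k_values)
]
```

Each of the resulting conditions would then be replicated 100 times, as described in the text.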

Simulated Data Sets 



The EQS 5.7 software program (Bentler & Wu, 1995) was used to generate pseudo-random
samples of data to compute the sample variance-covariance matrix. Previous research by Bang and
Schumacker (1998) has indicated that pseudo-random number generators do not produce normal
distributions of data with sample sizes less than 10,000. Three sample sizes of 1000, 5000, and 10000
were chosen for the study to reflect this lack of normality in pseudo-random number generators when
comparing the estimation methods.

Non-normal distributions were created by generating a normal distribution with $\mu$ and $\Sigma$, and
adding a smaller percent non-normal distribution with the same $\mu$, but with a variance-covariance
matrix equal to $k\Sigma$. The scale factor, k, which creates the non-normal population, ranges between 1
and 10. The present study used values of k = 1, 2, or 3, because the use of values greater than 3
generated elliptical data which failed to converge using either normal or elliptical estimation methods in
the study. The smaller percent non-normal distribution which is added to a normal distribution can range
between 0% and 100%, but is usually 10% or less; 5% and 10% were used in the study. Structural
equation modeling estimates are typically asymptotic, meaning that they approach the true population
value as sample size increases. These sample sizes are therefore suitable, especially since several
researchers (e.g., Bentler, 1992; Browne, 1982, 1984) have suggested that larger sample sizes may be
needed when estimation methods are based on fourth-order moments (kurtosis).
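
The contamination scheme described above, a normal population mixed with a smaller share of cases drawn with the same mean vector but a covariance matrix multiplied by k, can be sketched as follows. This standard-library Python code is a hedged illustration, not the EQS generator: the function names are hypothetical, and it assumes that a fixed count of cases (the rounded proportion) is drawn from the contaminating distribution, which may differ from how EQS assigns contaminated cases.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def contaminated_sample(n, mu, sigma, proportion, k, seed=0):
    """Draw n cases; a share `proportion` has covariance k * sigma instead of sigma."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    p = len(mu)
    n_contaminated = round(proportion * n)  # assumption: fixed count, not random
    rows, flags = [], []
    for case in range(n):
        is_contaminated = case < n_contaminated
        # Scaling deviations by sqrt(k) multiplies the covariance matrix by k.
        scale = math.sqrt(k) if is_contaminated else 1.0
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        x = [mu[i] + scale * sum(L[i][j] * z[j] for j in range(i + 1))
             for i in range(p)]
        rows.append(x)
        flags.append(is_contaminated)
    return rows, flags
```

With k = 1 the contaminating distribution is identical to the normal population, so only sample size varies across those conditions.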



Structural Model

Gerbing and Anderson (1992, 1993) suggested that using substantively meaningful models in
Monte Carlo simulation may increase our understanding of the results and that most simulation studies in
structural equation modeling have used from two to six latent variables, with two to six indicators for
each latent variable. In this study, a specific population model was simulated based on the
Bentler-Weeks (Bentler & Weeks, 1980) structural equation model (see Figure 3).

The number of distinct values in the sample variance-covariance matrix is ten (10). This can be
calculated as: .5(p + q)(p + q + 1), where p = the number of dependent variables and q = the number
of independent variables. The degrees of freedom for the chi-square statistic is calculated as the number
of distinct values in the sample variance-covariance matrix minus the number of parameters to be
estimated. Since there are ten distinct values in the sample variance-covariance matrix and six
parameters to be estimated in the model (four E’s, D2, and $\gamma$), the degrees of freedom is equal to four.
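
The degrees of freedom arithmetic in the preceding paragraph can be checked with a few lines of Python. The function names below are hypothetical and serve only to restate the calculation in the text.

```python
def distinct_elements(n_observed):
    """Number of distinct values in a covariance matrix of n_observed variables."""
    return n_observed * (n_observed + 1) // 2


def degrees_of_freedom(n_observed, n_free_parameters):
    """Distinct covariance elements minus the number of estimated parameters."""
    return distinct_elements(n_observed) - n_free_parameters


# Free parameters named in the text: four error variances, D2, and gamma.
free_parameters = ["E1", "E2", "E3", "E4", "D2", "gamma"]
df = degrees_of_freedom(4, len(free_parameters))
```

With four observed variables this gives ten distinct elements and, after subtracting the six free parameters, four degrees of freedom.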






Insert Figure 3 Here 
(Bentler-Weeks Model) 



The Bentler-Weeks structural model is specified in the EQS 5.7 software program using the
“/EQUATIONS” command to generate the population variance-covariance matrix (see appendix). The
EQS program “/EQUATIONS” command specifies fixed factor loadings of .80 (validity coefficients) for
the observed variables that identify both the exogenous factor, F1, and the endogenous factor, F2. The
“/EQUATIONS” command further indicates that V1 and V2 are two observed variables that are
indicator (manifest) variables of F1 (exogenous factor) and that V3 and V4 are two observed variables
that are indicator (manifest) variables of F2 (endogenous factor). A structural coefficient indicates that
F1 predicts F2. The following set of “/EQUATIONS” command lines indicates the Bentler-Weeks
structural equation model in the program:



/EQUATIONS
V1 = .8F1 + E1;
V2 = .8F1 + E2;
V3 = .8F2 + E3;
V4 = .8F2 + E4;
F2 = *F1 + D2;



where V1-V4 are observed variables, E1-E4 are measurement errors of the observed variables, F1 and
F2 are factors (latent variables), and D2 is the error of prediction for F2.
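
To show how equations of this form imply a population variance-covariance matrix, the sketch below computes the implied matrix for the four observed variables. It is a Python illustration under stated assumptions: the loading of .8 and the unit variance of F1 come from the program, whereas the values used for the structural coefficient, the variance of D2, and the error variances are placeholders chosen for the example, since the text does not report their population values. The function name `implied_covariance` is hypothetical.

```python
def implied_covariance(loading=0.8, gamma=1.0, var_f1=1.0, var_d2=1.0,
                       var_e=(1.0, 1.0, 1.0, 1.0)):
    """Model-implied covariance matrix of V1-V4 for the two-factor model."""
    # F2 = gamma * F1 + D2, so Var(F2) = gamma^2 * Var(F1) + Var(D2).
    var_f2 = gamma ** 2 * var_f1 + var_d2
    cov_f1_f2 = gamma * var_f1
    factor_cov = [[var_f1, cov_f1_f2], [cov_f1_f2, var_f2]]
    factor_of = [0, 0, 1, 1]  # V1 and V2 load on F1; V3 and V4 load on F2
    sigma = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            sigma[i][j] = loading * loading * factor_cov[factor_of[i]][factor_of[j]]
            if i == j:
                sigma[i][j] += var_e[i]  # add measurement error variance
    return sigma
```

Under these placeholder values, indicators of the same factor share a covariance of .64 for F1 and 1.28 for F2, and the cross-factor covariances equal .64 times the structural coefficient.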



Data Analysis

The EQS 5.7 software program (Bentler & Wu, 1995) was used to simulate normal and
elliptical distributions of data (see Figures 1 and 2) and estimate chi-square, structural coefficient, and
root mean square error of approximation values for 18 unique research conditions based on sample size,
population contamination percent, and kurtosis. The EQS 5.7 software program is annotated to indicate
which command lines were changed for each of the research conditions. For example, “CASES” was
used to specify the different sample sizes, “METHODS” was used to specify pairs of normal and
elliptical estimation methods, and “CONTAMINATION” was used to indicate the smaller percent
non-normal distribution and kurtosis factor. A sampling distribution based on 100 replications using a
pseudo-random number generator with different seed values produced a point estimate for chi-square,
parameter, and root mean square error of approximation values. The EQS 5.7 software program
provided the necessary summary statistics.

The model chi-square values can be compared against a critical chi-square value of 9.488 at the
.05 level of statistical significance for four degrees of freedom, and the root mean square error of
approximation values against a value equal to or less than .05, which implies a close fit. Kurtosis values
should satisfy $\kappa > -2/(p + 2)$, where p is the number of measured variables (Bentler & Berkane,
1986; Tyler, 1982). Given 4 measured variables, $\kappa > -2/6 \approx -.33$. The user should be aware that the
application of elliptical distributions to structural equation modeling is based on theoretical
considerations. There is little experience that can be used to provide guidance on how to avoid
breakdowns in the method, i.e., misleading results. It is possible that potential failures of elliptical
estimation methods can be associated with poor estimation of $\kappa$, hence poor estimation of the sample
variance-covariance matrix.
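
The critical value and the kurtosis bound mentioned above can be verified numerically. The sketch below is an illustrative standard-library Python check; the function names are hypothetical. It uses the closed-form upper-tail probability of the chi-square distribution, which holds only for even degrees of freedom, and evaluates the expression -2/(p + 2), which equals -1/3 for p = 4.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """P(chi-square > x); this closed form is valid for positive even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))


def kappa_lower_bound(p):
    """Lower bound -2 / (p + 2) on the elliptical kurtosis parameter."""
    return -2.0 / (p + 2)


p_value = chi_square_upper_tail_even_df(9.488, 4)  # approximately .05
bound = kappa_lower_bound(4)                       # approximately -.333
```

A model chi-square above 9.488 with four degrees of freedom therefore corresponds to a p-value below .05.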






Monte Carlo simulations were conducted based on generating data from a known population
model, then estimating this true population model under different research conditions. Consequently,
power determination was not required in the study. In practice, testing a null hypothesis of model fit
requires power and sample size considerations. Schumacker and Lomax (1996) and MacCallum,
Browne, and Sugawara (1996) provide programs and recommendations for power calculations and
sample size. For example, the Hoelter critical N, which is $CN = (\chi^2/F) + 1$, gives the sample size at
which F would lead to a rejection of the null hypothesis. Their programs also use modification index
values and root mean square error of approximation (RMSEA) values. The RMSEA values, together
with the degrees of freedom (df) for the model, the sample size (n), and Type I error rate (alpha), are
used to calculate power. RMSEA values <= .05 are considered a 'close fit'; values between .05-.08 are
considered 'fair fit', between .08-.10, 'mediocre fit', and RMSEA > .10, 'poor fit'.
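
The RMSEA cutoffs and the Hoelter critical N above can be expressed as a short sketch. The Python code below is illustrative only and the function names are hypothetical. The RMSEA point estimate shown is a commonly used formula that is not given in the text, and some programs use n rather than n - 1 in the denominator; the Hoelter function assumes that the chi-square in the formula is the critical value and that F is the minimized fit function.

```python
import math


def rmsea(chi_square, df, n):
    """Common point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def fit_label(value):
    """Classify an RMSEA value using the cutoffs stated in the text."""
    if value <= 0.05:
        return "close fit"
    if value <= 0.08:
        return "fair fit"
    if value <= 0.10:
        return "mediocre fit"
    return "poor fit"


def hoelter_cn(critical_chi_square, f_min):
    """Hoelter critical N: (chi-square / F) + 1."""
    return critical_chi_square / f_min + 1
```

For example, `fit_label(0.0315)` returns 'close fit', whereas `fit_label(0.1883)` returns 'poor fit'.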



RESULTS 

The chi-square values at k = 1 for both normal and elliptical estimation methods yielded similar
results across the research conditions. These findings were expected because only sample size effects
were present, with percent contamination having no impact. The results more clearly reflect the outcome
of data generated using a pseudo-random number generator. (An average chi-square value of 3.84 was
obtained from the sampling distribution based on 100 replications using a normal distribution with
sample sizes greater than 10,000.) The structural coefficients were similar for both normal and elliptical
estimation methods across the research conditions. The root mean square error of approximation
(RMSEA) was robust across the research conditions for all estimation methods, except under extreme
levels of contamination (10%) and kurtosis (k=3).

Least Squares Estimation 

The normal least squares (LS) and elliptical least squares (ELS) estimation methods are 
compared in Tables 1 to 6. As the percent non-normal data and kurtosis increased, the chi-square 
values increased, but the elliptical least squares estimation method computed lower chi-square values. 
The structure coefficients and root mean square error of approximation values (RMSEA) remained
similar, but were more distorted under conditions of extreme percent contamination (10%) and kurtosis 
(k=3). The least squares estimation method failed to yield a solution (lacked convergence) under these 
conditions, returning fewer than the required 100 replications. 

Insert Tables 1 to 6 Here 
LS/ELS Tables 

Generalized Least Squares Estimation

The normal generalized least squares (GLS) and elliptical generalized least squares (EGLS)
estimation methods are compared in Tables 7 to 12. As the percent non-normal data and kurtosis
increased, the chi-square values increased, but the elliptical generalized least squares estimation method
computed lower chi-square values. The structure coefficients and root mean square error of
approximation values (RMSEA) remained similar across research conditions and were more robust
under conditions of extreme percent contamination (10%) and kurtosis (k=3) than the previous least
squares estimation methods. The elliptical generalized least squares estimation methods also performed
better under these extreme conditions and returned the required 100 replications, except for
percent=10%, n=1000, k=3.



Insert Tables 7 to 12 Here 
GLS/EGLS Tables 



Maximum Likelihood Estimation 

The maximum likelihood (ML) and elliptical re-weighted least squares (ERLS) estimation
methods are compared in Tables 13 to 18. As the percent non-normal data and kurtosis increased, the
chi-square values increased, but the elliptical re-weighted least squares estimation method computed
lower chi-square values. The structure coefficients and root mean square error of approximation
(RMSEA) values remained similar across research conditions and were similar to results obtained using
the least squares estimation methods. The elliptical re-weighted least squares estimation method,
however, performed better under extreme conditions and returned the required 100 replications, except
for percent = 10%, n=1000, k=3.



Insert Tables 13 to 18 Here 
ML/ERLS Tables

CONCLUSIONS AND RECOMMENDATIONS



The elliptical estimation methods performed better overall than the normal estimation methods in
the presence of increasing contamination and kurtosis; e.g., the normal least squares (LS) estimation
method failed to reach a solution (lacked convergence) under increased kurtosis. The elliptical
generalized least squares (EGLS) estimation method overall performed better than the other estimation
methods in computing chi-square, structure coefficient, and root mean square error of approximation
values under increasing contamination and kurtosis. Previous findings by Bentler (1983a), Harlow and
Newcomb (1984), Muthen and Kaplan (1985), and Tanaka (1984), which indicated that ML chi-square
estimates were too large, causing the rejection of a true structural equation model too often, were
supported in the study. The tendency for increased levels of kurtosis to affect elliptical estimated
chi-square statistics, as reported by Harlow (1985), was also substantiated in the present study. In
contrast, the finding by Weng and Cheng (1997) that chi-square values computed by LS, GLS, and
ML estimators differ, but that the effects diminish as sample size increases, was not supported,
especially under increased kurtosis in this study.

The effects of various estimation methods on the parameter estimate in the structural equation
model were found to be minimal. This was supported by Harlow (1985), who concluded that ML and
ERLS parameter estimates were comparable in a Monte Carlo simulation study; Muthen and
Kaplan (1985), who found no difference between parameter estimates using ML and GLS estimation
methods; Henly (1993), who pointed out a striking similarity between ML and GLS estimates; and
Wang, Fan, and Willson (1996), who also found the results from ML and GLS estimation methods to be
practically identical as sample size increased.

The root mean square error of approximation (RMSEA) was robust across the research 
conditions and estimation methods. The root mean square error of approximation values were especially 
robust in the presence of kurtosis using the elliptical estimation methods. The root mean square error of
approximation is therefore the preferred inferential approach to assessing model fit because of known 
distributional properties and determination of confidence intervals for hypothesis testing. 

In practice, researchers are often confronted with non-normal data, i.e., skewness and kurtosis.
The findings in this study, together with related research, suggest several recommendations.
First, determine the sample size and power needed to conduct a test of the structural model
using programs by MacCallum, Browne, and Sugawara (1996) and/or Schumacker and Lomax (1996).
Second, based on a comparison of non-normal data transformation methods, use a probit regression
transformation to produce an approximate normal distribution of data to handle skewness. Third, use
the elliptical generalized least squares estimation method with non-normal kurtotic data. Fourth, report
the root mean square error of approximation (RMSEA) and associated confidence interval to test
hypotheses concerning model fit. And finally, when reporting chi-square statistics, conduct the
Bollen-Stine bootstrap technique to yield a test of the sufficiency of the obtained model chi-square value
and/or report the Satorra-Bentler re-scaled chi-square statistic (Chou, Bentler, & Satorra, 1991).

APPENDIX



EQS 5.1 Computer Program 

/TITLE 
A comparison of normal and elliptical estimation methods 
/SPECIFICATIONS 
CASES=1000;               ! (Sample Size: 1000, 5000, or 10000) 
VARIABLES=4; 
METHODS=LS,ELS;           ! (Compare normal and elliptical estimation methods) 
MATRIX=RAW; 
ANALYSIS=COVARIANCE; 
/EQUATIONS                ! (/Equations specify Bentler-Weeks model) 
V1=.8F1+E1; 
V2=.8F1+E2; 
V3=.8F2+E3; 
V4=.8F2+E4; 
F2=1*F1+D2; 
/VARIANCES 
F1=1; E1 to E4=*; 
D2=*; 
/SIMULATION 
POPULATION=MODEL; 
REPLICATIONS=100;         ! (Number of replications = 100) 
SEED=7896543;             ! (Seed for random number generator) 
CONTAMINATION=.05,1;      ! (Population data contamination level = .05, .10 and 
                          !  k factor = 1, 2, 3) 
/TECHNICAL 
eiter=100; starti=els;    ! (Iterations and start values for elliptical estimation) 
/OUTPUT 
parameter estimates; standard errors; 
/END 



NOTE: In Cheevatanarak, S. (1999), the EQS software program was defective, leading to incorrect 
tabled values and conclusions. This revised EQS program was used to yield correct tabled values. 







TABLE 1 LS versus ELS method: Contamination = 5%, n = 1000 

Contamination  n      k   χ² LS      χ² ELS     Y LS       Y ELS      RMSEA LS   RMSEA ELS 
5%             1000   1   4.2305     4.2393     -.0087     .0087      .0315      .0316 
                          (2.716)    (2.731)    (1.011)    (1.011)    (.024)     (.024) 
                      2   9.7972     7.2139     -.0184     .0184      .0795      .0663 
                          (7.045)    (5.152)    (1.209)    (1.209)    (.037)     (.032) 
                      3   29.6109    11.3825    .1250      -.1250     .1883      .1150 
                          (16.19)    (2.592)    (1.531)    (1.538)    (.063)     (.041) 

TABLE 2 LS versus ELS method: Contamination = 5%, n = 5000 

Contamination  n      k   χ² LS      χ² ELS     Y LS       Y ELS      RMSEA LS   RMSEA ELS 
5%             5000   1   4.1767     4.1820     -.0012     .0012      .0148      .0149 
                          (2.529)    (2.535)    (1.008)    (1.008)    (.009)     (.009) 
                      2   28.2544    20.5225    .0017      -.0017     .0688      .0588 
                          (12.73)    (9.193)    (1.210)    (1.005)    (.018)     (.015) 
                      3   138.278    51.0223    -.0481     -.0053     .1944      .1196 
                          (44.23)    (16.968)   (1.607)    (1.612)    (.034)     (.021) 

TABLE 3 LS versus ELS method: Contamination = 5%, n = 10000 

Contamination  n      k   χ² LS      χ² ELS     Y LS       Y ELS      RMSEA LS   RMSEA ELS 
5%             10000  1   3.8879     3.8868     .0020      -.0020     .0103      .0103 
                          (2.451)    (2.442)    (1.007)    (1.007)    (.007)     (.007) 
                      2   49.7868    36.2468    .0001      -.0001     .0667      .0573 
                          (17.960)   (12.995)   (1.206)    (1.206)    (.012)     (.010) 
                      3   270.248    97.9009    .0139      -.0139     .1936      .1183 
                          (64.657)   (23.491)   (1.601)    (1.601)               (.015) 

Note: Standard deviations for chi-squares, parameters, and root mean square error of approximation are in 
parentheses in the tables. Results based on 100 replications (r), except when k = 3 due to non-convergence 
(n = 1,000, k = 3, r = 79; n = 5,000, k = 3, r = 97; n = 10,000, k = 3, r = 99). 



TABLE 4 LS versus ELS method: Contamination = 10%, n = 1000 

Contamination  n      k   χ² LS      χ² ELS     Y LS       Y ELS      RMSEA LS   RMSEA ELS 
10%            1000   1   4.2305     4.2393     -.0087     .0087      .0315      .0316 
                          (2.716)    (2.731)    (1.011)    (1.011)    (.024)     (.024) 
                      2   18.1399    11.9387    .0086      -.0086     .1350      .1089 
                          (10.274)   (6.632)    (1.398)    (1.398)    (.045)     (.037) 
                      3   39.3374    23.6486    .5394      .0272      .2772      .2415 
                          (20.380)   (10.458)   (1.783)    (2.012)    (.090)     (.072) 

TABLE 5 LS versus ELS method: Contamination = 10%, n = 5000 

Contamination  n      k   χ² LS      χ² ELS     Y LS       Y ELS      RMSEA LS   RMSEA ELS 
10%            5000   1   4.1767     4.1820     -.0012     .0012      .0148      .0149 
                          (2.529)    (2.535)    (1.008)    (1.008)    (.009)     (.009) 
                      2   73.1212    47.4799    .0084      -.0084     .1291      .1049 
                          (21.596)   (14.038)   (1.005)    (1.005)    (.020)     (.016) 
                      3   243.548    84.0961    -.4071     .4071      .3231      .1918 
                          (28.048)   (10.651)   (2.142)    (2.142)    (.016)     (.011) 

TABLE 6 LS versus ELS method: Contamination = 10%, n = 10000 

Contamination  n      k   χ² LS      χ² ELS     Y LS       Y ELS      RMSEA LS   RMSEA ELS 
10%            10000  1   3.8879     3.8868     .0020      -.0020     .0103      .0103 
                          (2.451)    (2.442)    (1.007)    (1.007)    (.007)     (.007) 
                      2   141.967    92.1144    .0032      -.0032     .1286      .1046 
                          (33.885)   (21.816)   (1.406)    (1.406)    (.016)     (.013) 
                      3   439.546    152.562    -1.890     1.890      .3163      .1883 
                          (0.000)    (0.000)    (0.000)    (0.000)    (0.000)    (0.000) 

Note: Standard deviations for chi-squares, parameters, and root mean square error of approximation are in 
parentheses in the tables. Results based on 100 replications (r), except when k = 3 due to non-convergence 
(n = 1,000, k = 3, r = 17; n = 5,000, k = 3, r = 5; n = 10,000, k = 3, r = 1). 



TABLE 7 GLS versus EGLS method: Contamination = 5%, n = 1000 

Contamination  n      k   χ² GLS     χ² EGLS    Y GLS      Y EGLS     RMSEA GLS  RMSEA EGLS 
5%             1000   1   4.2199     4.2282     -.0068     .0068      .0108      .0109 
                          (2.687)    (2.700)    (1.005)    (1.005)    (.013)     (.013) 
                      2   8.9916     6.5332     -.0094     .0090      .0291      .0199 
                          (6.098)    (4.379)    (1.107)    (1.103)    (.021)     (.019) 
                      3   28.9084    10.2459    -.0112     .0092      .0748      .0359 
                          (15.008)   (5.197)    (1.290)    (1.264)    (.025)     (.017) 

TABLE 8 GLS versus EGLS method: Contamination = 5%, n = 5000 

Contamination  n      k   χ² GLS     χ² EGLS    Y GLS      Y EGLS     RMSEA GLS  RMSEA EGLS 
5%             5000   1   4.1577     4.1629     -.0001     .0001      .0046      .0046 
                          (2.520)    (2.526)    (1.005)    (1.005)    (.005)     (.005) 
                      2   25.4624    18.2719    .0007      -.0006     .0256      .0134 
                          (10.827)   (7.674)    (1.110)    (1.106)    (.007)     (.019) 
                      3   116.339    39.6596    .0006      -.0003     .0740      .0417 
                          (34.803)   (11.372)   (1.301)    (1.275)    (.011)     (.006) 

TABLE 9 GLS versus EGLS method: Contamination = 5%, n = 10000 

Contamination  n      k   χ² GLS     χ² EGLS    Y GLS      Y EGLS     RMSEA GLS  RMSEA EGLS 
5%             10000  1   3.8791     3.8781     .0022      -.0022     .0027      .0027 
                          (2.443)    (2.434)    (1.006)    (1.006)    (.003)     (.003) 
                      2   44.6443    32.1243    -.0006     .0007      .0313      .0260 
                          (15.073)   (10.707)   (1.107)    (1.103)    (.005)     (.005) 
                      3   222.340    76.1177    -.0050     .0068      .0734      .0422 
                          (47.854)   (15.914)   (1.293)    (1.005)    (.008)     (.004) 

Note: Standard deviations for chi-squares, parameters, and root mean square error of approximation are in 
parentheses in the tables, based on 100 replications. 

TABLE 10 GLS versus EGLS method: Contamination = 10%, n = 1000 

Contamination  n      k   χ² GLS     χ² EGLS    Y GLS      Y EGLS     RMSEA GLS  RMSEA EGLS 
10%            1000   1   4.2199     4.2282     -.0068     .0068      .0108      .0109 
                          (2.687)    (2.700)    (1.005)    (1.005)    (.013)     (.013) 
                      2   16.3307    10.4576    -.0134     .0126      .0518      .0362 
                          (8.802)    (5.409)    (1.207)    (1.197)    (.020)     (.018) 
                      3   52.5078    17.0954    .0203      -.0028     .1078      .0554 
                          (19.686)   (6.044)    (1.548)    (1.500)    (.023)     (.014) 

TABLE 11 GLS versus EGLS method: Contamination = 10%, n = 5000 

Contamination  n      k   χ² GLS     χ² EGLS    Y GLS      Y EGLS     RMSEA GLS  RMSEA EGLS 
10%            5000   1   4.1577     4.1629     -.0001     .0001      .0046      .0046 
                          (2.520)    (2.526)    (1.005)    (1.005)    (.005)     (.005) 
                      2   62.6112    39.7008    .0008      -.0007     .0536      .0418 
                          (17.123)   (10.729)   (1.206)    (1.196)    (.007)     (.006) 
                      3   239.777    75.8848    .0013      -.0102     .1082      .0597 
                          (42.825)   (13.401)   (1.561)    (.993)     (.009)     (.005) 

TABLE 12 GLS versus EGLS method: Contamination = 10%, n = 10000 

Contamination  n      k   χ² GLS     χ² EGLS    Y GLS      Y EGLS     RMSEA GLS  RMSEA EGLS 
10%            10000  1   3.8791     3.8781     .0022      -.0022     .0027      .0027 
                          (2.443)    (2.434)    (1.006)    (1.006)    (.003)     (.003) 
                      2   120.921    76.6337    .0020      -.0019     .0537      .0423 
                          (26.454)   (16.420)   (1.208)    (1.199)    (.006)     (.004) 
                      3   476.482    150.603    .0020      -.0016     .1084      .0604 
                          (67.304)   (20.726)   (1.565)               (.007)     (.004) 

Note: Standard deviations for chi-squares, parameters, and root mean square error of approximation are in 
parentheses in the tables, based on 100 replications (r), except for n = 1,000, k = 3, r = 98. 

TABLE 13 ML versus ERLS method: Contamination = 5%, n = 1000 

Contamination  n      k   χ² ML      χ² ERLS    Y ML       Y ERLS     RMSEA ML   RMSEA ERLS 
5%             1000   1   4.2436     4.2473     -.0068     .0068      .0111      .0111 
                          (2.733)    (2.756)    (1.009)    (1.009)    (.013)     (.013) 
                      2   9.6928     7.4258     -.0105     .0102      .0309      .0228 
                          (6.969)    (5.478)    (1.118)    (1.114)    (.023)     (.021) 
                      3   34.4752    14.4736    -.0160     .0104      .0820      .0464 
                          (20.240)   (9.083)    (1.339)    (1.314)    (.030)     (.022) 


TABLE 14 ML versus ERLS method: Contamination = 5%, n = 5000 

Contamination  n      k   χ² ML      χ² ERLS    Y ML       Y ERLS     RMSEA ML   RMSEA ERLS 
5%             5000   1   4.1790     4.1919     -.0001     .0001      .0047      .0047 
                          (2.532)    (2.542)    (1.006)    (1.006)    (.005)     (.005) 
                      2   27.5948    20.926     .0007      -.0005     .0330      .0278 
                          (12.300)   (9.495)    (1.115)    (1.111)    (.009)     (.008) 
                      3   140.185    56.576     .0015      -.0010     .0813      .0504 
                          (47.217)   (20.010)   (1.337)    (1.311)    (.014)     (.009) 

TABLE 15 ML versus ERLS method: Contamination = 5%, n = 10000 

Contamination  n      k   χ² ML      χ² ERLS    Y ML       Y ERLS     RMSEA ML   RMSEA ERLS 
5%             10000  1   3.8867     3.8880     .0022      -.0022     .0027      .0027 
                          (2.449)    (2.442)    (1.006)    (1.006)    (.003)     (.003) 
                      2   48.4701    36.8498    -.0005     .0008      .0327      .0281 
                          (17.268)   (13.391)   (1.112)    (1.108)    (.006)     (.005) 
                      3   268.350    108.588    -.0047     .0050      .0807      .0111 
                          (64.916)   (27.782)   (1.328)    (1.301)    (.009)     (.013) 



Note: Standard deviations for chi-squares, parameters, and root mean square error of approximation are in 
parentheses in the tables, based on 100 replications. 



TABLE 16 ML versus ERLS method: Contamination = 10%, n = 1000 

Contamination  n      k   χ² ML      χ² ERLS    Y ML       Y ERLS     RMSEA ML   RMSEA ERLS 
10%            1000   1   4.2436     4.2473     -.0068     .0068      .0111      .0111 
                          (2.733)    (2.756)    (1.009)    (1.009)    (.013)     (.013) 
                      2   18.5216    13.0090    -.0159     .0152      .0559      .0428 
                          (10.869)   (7.759)    (1.230)    (1.221)    (.023)     (.021) 
                      3   65.9481    26.8282    .0514      .0257      .1212      .0726 
                          (28.302)   (12.471)   (1.671)    (1.636)    (.028)     (.021) 



TABLE 17 ML versus ERLS method: Contamination = 10%, n = 5000 

Contamination  n      k   χ² ML      χ² ERLS    Y ML       Y ERLS     RMSEA ML   RMSEA ERLS 
10%            5000   1   4.1790     4.1919     -.0001     .0001      .0047      .0047 
                          (2.532)    (2.542)    (1.006)    (1.006)    (.005)     (.005) 
                      2   71.6210    49.9599    .0006      -.0007     .0575      .0473 
                          (21.113)   (15.285)   (1.229)    (1.213)    (.009)     (.007) 
                      3   309.804    121.616    .0027      -.0110     .1230      .0762 
                          (65.043)   (27.830)   (1.675)    (1.003)    (.012)     (.008) 

TABLE 18 ML versus ERLS method: Contamination = 10%, n = 10000 

Contamination  n      k   χ² ML      χ² ERLS    Y ML       Y ERLS     RMSEA ML   RMSEA ERLS 
10%            10000  1   3.8867     3.8880     .0022      -.0022     .0027      .0027 
                          (2.449)    (2.442)    (1.006)    (1.006)    (.003)     (.003) 
                      2   138.753    96.809     .0019      -.0020     .0576      .0478 
                          (32.933)   (23.670)   (1.009)    (1.215)    (.007)     (.006) 
                      3   618.058    242.540    .0029      -.0025     .1235      .0769 
                          (102.49)   (43.120)   (1.678)    (1.628)    (.010)     (.007) 



Note: Standard deviations for chi-squares, parameters, and root mean square error of approximation are in 
parentheses in the tables, based on 100 replications (r), except for n = 1,000, k = 3, r = 96. 



Figure 1. Normal Distribution 
Figure 2. Elliptical Distribution 
Figure 3. Bentler-Weeks Structural Equation Model 